Dictionary acquisition using parallel text and co-occurrence statistics

Authors

  • Christian Biemann
  • Uwe Quasthoff
Abstract

We present a simple and efficient approach for deriving bilingual dictionaries from sentence-aligned parallel text by extending the notion of co-occurrences to a cross-lingual setting. Dictionaries are evaluated against gold standards and manually; the analysis accounts for frequency and corpus size effects.
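As a rough illustration of what cross-lingual co-occurrence could look like, the sketch below counts how often a source word and a target word appear in the same aligned sentence pair and ranks target words by an association score. The abstract does not say which significance measure the authors use, so the Dice coefficient here is a stand-in assumption, and the function names (`cross_lingual_cooccurrences`, `dice`, `translation_candidates`) and the toy corpus are hypothetical.

```python
from collections import Counter
from itertools import product


def cross_lingual_cooccurrences(aligned_pairs):
    """Count word and word-pair frequencies over aligned sentence pairs.

    A (source word, target word) pair co-occurs once for every aligned
    sentence pair in which both words appear.
    """
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for src_sentence, tgt_sentence in aligned_pairs:
        # Use sets so each word counts at most once per sentence.
        src_words = set(src_sentence.lower().split())
        tgt_words = set(tgt_sentence.lower().split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        pair_freq.update(product(src_words, tgt_words))
    return src_freq, tgt_freq, pair_freq


def dice(pair_count, src_count, tgt_count):
    """Dice coefficient (assumed measure, not taken from the paper)."""
    return 2.0 * pair_count / (src_count + tgt_count)


def translation_candidates(aligned_pairs, top_n=3):
    """Rank target words for each source word by association score."""
    src_freq, tgt_freq, pair_freq = cross_lingual_cooccurrences(aligned_pairs)
    candidates = {}
    for (src, tgt), count in pair_freq.items():
        score = dice(count, src_freq[src], tgt_freq[tgt])
        candidates.setdefault(src, []).append((tgt, score))
    # Sort by descending score, then alphabetically for stable ties.
    return {
        src: sorted(scored, key=lambda item: (-item[1], item[0]))[:top_n]
        for src, scored in candidates.items()
    }


# Hypothetical toy corpus of sentence-aligned English-German pairs.
toy_corpus = [
    ("the house is small", "das haus ist klein"),
    ("the house is old", "das haus ist alt"),
    ("the car is small", "das auto ist klein"),
]

if __name__ == "__main__":
    for word, ranked in sorted(translation_candidates(toy_corpus).items()):
        print(word, ranked)
```

On a corpus this small, function words such as "the" tie with several target words, which hints at why the abstract stresses frequency and corpus size effects in the evaluation.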


Similar resources

A Methodology for Bilingual Lexicon Extraction from Comparable Corpora

Dictionary extraction using parallel corpora is well established. However, for many language pairs parallel corpora are a scarce resource which is why in the current work we discuss methods for dictionary extraction from comparable corpora. Hereby the aim is to push the boundaries of current approaches, which typically utilize correlations between co-occurrence patterns across languages, in sev...


Compiling Bilingual Lexicon Entries From a Non-Parallel English-Chinese Corpus

We propose a novel context heterogeneity similarity measure between words and their translations in helping to compile bilingual lexicon entries from a non-parallel English-Chinese corpus. Current algorithms for bilingual lexicon compilation rely on occurrence frequencies, length or positional statistics derived from parallel texts. There is little correlation between such statistics of a word ...


Term-list Translation Using Mono-lingual Word Co-occurrence Vectors

A term-list is a list of content words that characterize a consistent text or a concept. This paper presents a new method for translating a term-list by using a corpus in the target language. The method first retrieves alternative translations for each input word from a bilingual dictionary. It then determines the most 'coherent' combination of alternative translations, where the coherence of a s...


Extracting Bilingual Collocations from Non-Aligned Parallel Corpora

This paper proposes a new method to find correspondences of uninterrupted collocations from Japanese-English bilingual corpora without sentence-to-sentence alignment. Uninterrupted collocations in English such as “once again”, “give up”, or “gross national product” handled as a single word or a compound word in Japanese, can be automatically extracted with corresponding Japanese words using wor...


Query Term Disambiguation Using Co-occurrence Statistics for Dictionary based Cross Lingual Information Retrieval

Query translation in cross-lingual information retrieval can be done using machine translation, parallel corpora, or a machine-readable dictionary. The technique which is most cost-effective and least time-consuming wins the major votes. Working on this line, many researchers opt for machine-readable dictionaries, which are easily available. Dictionaries usually provide more than one translation in ...



Publication date: 2005